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The barotropic model is used to explore the advantages of parallel processing in deterministic forecast- 
ing. We apply this model to the trade forecasting of hurricane Elena (1985). 

In this particular application, solutions to systems of elliptic equations are the essence of the computa- 
tional mechanics. One set of equations is associated with the decomposition of the wind into irrotational 
and nondivergent components— this determines the initial nondivergent state. Another set is associated 
with recovery of the streamfunction from the forecasted vorticity. We demonstrate that direct parallel 
methods based on accelerated block cyclic reduction (BCR) significantly reduce the computational time 
required to solve the elliptic equations germane to this decomposition and forecast problem. A 72-h track 
prediction was made using incremental time steps of 16 min on a network of 3000 grid points nominally 
separated by 100 km. The prediction took 30 sec on the 8-processor Alliant FX/8 computer. This was a 
speed-up of 3.7 when compared to the one-processor version 

The 72-h prediction of Elena’s track was made as the storm moved toward Florida’s west coast Ap- 
proximately 200 km west of Ihmpa Bay, Elena executed a dramatic recurvature that ultimately changed 
its course toward the northwest Although the barotropic track forecast was unable to capture the hur- 
ricane’s tight cycloidal looping maneuver, the suBsequent northwesterly movement was accurately fore- 
casted as was the location and timing of landfall dear Mobile Bay. 
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1. INTRODUCTION 

The use of barotropic models to track hurricanes (typhoons) was initiated in Japan 
during the early 1950s (Sasaki and Miyakoda, 1954). Research continued into the 
1960s in both Japan and the U.SA., where the success of research models eventu- 
ally led to operational implementation at the National Hurricane Center (NHC) in 
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1968 (Sanders et al., 1975). For a history of barotropic modeling as applied to the 
hurricane tracking problem and commentary on its relevance in current research, 
see DeMaria (1985) and Shapiro and Ooyama (1990). 

The sensitivity of barotropic track forecasting to perturbations in the initial con- 
ditions was convincingly demonstrated in the paper by Sanders and Burpee (1968). 
Determination of the initial state is especially difficult due to the scarcity of data 
over the oceans where tropical storms develop. Most observations over the ocean 
are obtained from weather satellites. Typically, these observations are wind esti- 
mates made from a sequence of photographs or images — so-called “cloud-tracked 
winds” or “water-vapor winds” (Velden et al., 1992). Since the barotropic model 
assumes a nondivergent wind, the observations of wind must be decomposed or 
partitioned into nondivergent and irrotalional components. 

In an effort to improve the initial conditions for barotropic forecasting, recent 
efforts have been made to include a time history of wind analysis into the estimate 
of the initial state. To accomplish this task in a dynamically consistent fashion, ad- 
joint methods have been used (Lewis et al., 1987; DeMaria and Jones, 1991). These 
assimilation/initialization methods require a model of comparable complexity to the 
forecast model. The so-called adjoint model is integrated backward in time and is 
used iteratively with the forward model (forecast model) to find an optimal initial 
condition that minimizes the discrepancy between model forecast and observations. 
Although aesthetically pleasing and theoretically attractive, the computational de- 
mands have made this strategy prohibitive in the operational environment With the 
research models tested to date, the computational cost of an adjoint assimilation 
with barotropic models is % 10-20 times as costly as the forecast alone (Lewis and 
Derber, 1985; Talagrand and Courtier, 1987). Realistic implementation of adjoint 
assimilation in operations will require the maturing technology of multicomputer 
and multiprocessors (Lakshmivarahan and Dhall, 1990). We refer to this technology 
as parallel processing or parallel computing. 

Our ultimate aim is to apply parallel computing technology to both the forecast- 
ing and assimilation problem (adjoint assimilation). We concentrate on the forward 
problem in this initial effort, but the transfer of the computational algorithms to 
the backward or adjoint problem should be straightforward. We apply our parallel 
processing mechanics to the track forecast of hurricane Elena (1985). This case is 
especially challenging because of the looping maneuver that the storm executed in 
the northeastern Gulf of Mexico. This case study also benefits from an unusually 
large set of synoptic-scale wind analyses produced at the Space Science and Engi- 
neering Center (University of Wisconsin-Madison). 

In Section 2, following Lynch (1988, 1989), we describe a computational scheme 
for decomposing a windfield into non-divergent and irrotational components. This 
scheme leads to the solution of two related Poisson’s equations. The baratropic 
model itself is described in Section 3. This has two components — one is the time- 
dependent evolution of vorticity and the second is a Poisson’s equation relating the 
stream function and vorticity. Since the solution of Poisson’s equation is central to 
the computations, we use a direct parallel method, called the block cyclic reduction, 
for solving these equations. A summary of this method is given in Appendix A. Sec- 
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tion 4 discusses various issues related to choice of grid, boundary conditions, nature 
and type of discrete approximations, and conditions for stability. Speed-up analy- 
sis resulting from solving this model on the state of the art multivector processor 
Alliant FX/8 is given in Section 5. Concluding statements are given in Section 6. 


2. DECOMPOSITION 


To facilitate our work, we have mapped the earth onto the Mercator projection. 
In the kinematic and dynamics equations that follow, the east-west and north- 
south coordinates — x and y, respectively — are distances on the Mercator plane and 
the image scale factor, m, is the ratio of distance on this projection to the corre- 
sponding distance on earth. This image factor is given by (cos ^o/ cos <p) where <po is 
the latitude of intersection between the Mercator cylinder and the spherical earth 
(<po =« 30° N in our case). Saucier (1955) discusses the geometry of the Mercator 
projection and includes the derivation of the image scale factor. 

Let V denote the horizontal wind vector, x the velocity potential, V the stream 
function, ( the vorticity and 6 the divergence. Let and V x be the nondivergent 
and the irrotational components of this wind field, respectively. Helmholtz’s theo- 
rem permits decomposition of V as follows: 

v--v* + v x , (1) 

where 


V x = k x Vtf, (2) 

V x = V X , (3) 

and V = ( m(d/dx ), m(pjdy )) is the 2 — d gradient operator. To simplify notation, 
the image-scale factor will be assumed to be associated with all differential opera- 
tors, but it will not be explicitly written. Thus, d/dx is in actuality m(d/ dx), (d 2 ]dx 2 ) 
means m(d/dx)(m(dldx)), etc. Let i, j and k be unit vectors in the 2 orthog- 
onal horizontal directions ( x,y ) and the vertical direction (z), respectively. Since 
V ■ = o and V x V x = 0, it follows that 

( = k'VxV = k'Vx(kx V\&) = V 2 ^ (4) 


and 

where 


6 = V • V = V 2 x 


= ( iL + *L) 

\9x 2 dy 2 ) 


( 5 ) 


Since V = iu + jv, then 
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Table 1 Lynch’s Eight Different Variations of Boundary Conditions for Decomposing the Wind Field 
(Dirichlet and Neuman conditions are denoted by D and M, respectively) 


Version 

Number 

First Boundary 
Condition 

Second Boundary 
Condition 

Type of First 
and Second B.G 

1 

x = o 

= f? ( ^ dS 

J So \drt ) 

DD 

2 

II 

* 

* - Stfi- - w s 

ND 

3 

# =0 

m 9 * -V 
m dn ~ Vn 

DN 

4 

d* 

m 'dn =7/ 

dy 

m — = V„ + m — 
drt ds 

NN 

5 

X 

II 

o 

?is 

ii 

DN 

6 

8y 

m Tn = 7 " 

3 

ii 

i 

%\3 

NN 

7 

o 

II 


DD 

8 

d* 

m *r =75 

X = jfo-JsW 

ND 



To extract the nondivergent wind component from the observed wind, we first 
compute the vorticity and divergence which serve as the forcing functions in the so- 
lution to Poisson’s equations, Eqs. (4) and (5). These equations are coupled through 
the boundary condition (b.c.’s) which can assume a variety of forms. As discussed 
by Lynch and collaborators (Bijlsma et al., 1986; Lynch, 1988, 1989), the decompo- 
sition is not unique in a limited domain and a wide variety of results are possible 
that depend on the choice of b.c.’s. Table 1 displays eight different sets of b.c.’s that 
have been tested by Lynch (1988). The first four versions use the normal compo- 
nent (V n h) of the observed wind on the boundary whereas the last four use the 
tangential component ( V s S ). The parameters 

7 "( = ^z/ v " ds ) “ d 

are constant gradient conditions on the boundary that satisfy divergence theorem 
identities — so-called compatibility conditions. Section 2 of Lynch (1988) can be read 
for a more detailed discussion of these boundary condition choices. 

Our choices are narrowed by choosing b.c.’s that minimize the kinetic energy in 
the divergent component (K x ). This is dictated by the desire to minimize the kinetic 
energy in the “largely spurious divergent part of the wind” (Sanders and Burpee, 
1968). As shown by Davies-Jones (1988) and Lynch (1988), x = 0 on the boundary 
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is sufficient to minimize K x and furthermore causes the nonphysical cross-term Ky x 
to vanish (again see Lynch (1988) for details). Version 1 (due to Sangster (I960)) 
and version 5 are, therefore, candidates. Dirichlet b.c.’s are easier to implement 
than Neumann conditions since the solution is unique, i.e., it does not involve an 
arbitrary additive constant, nor is there a compatibility condition. Thus, version 1 is 
preferred. Care must be exercised, however, to insure that 


*a(S) = 



dS 


is single valued; subscript B is used to indicate the value on the boundary. This 
requirement can be satisfied by first solving (5) for x with \b — 0> computing 
dx/dn \ B and adjusting V n (to V„) by a constant amount £ around the boundary 
such that 

j£(£- K -*)"- a 

The final step is to solve (4) with 


'J'b = 



where ^(.Sb) = 0 


by choice. Thus, version 1 has the same desirable properties as version 5 without 
the complexities of a Neumann b.c. The only difference is that version 1 uses V„ 
on the boundary while version 5 uses Vy. There doesn’t appear to be any physical 
reason for preferring V s over V„, or vice versa. 


3. THE BAROTROPIC MODEL 

The approach to barotropic forecasting follows Leith (1978) and DeMaria (1987). 
The prognostic equation is given by 

§-j«+/,n (8) 

where J is the Jacobian and / is the Coriolis parameter. 

In this prognostic equation, ( represents a modified vorticity given by 

C = (V 2 - <r 2 )tf . (9) 

The derivation of (8) is found in Leith (1978) and follows from operations on the 
shallow water equations. As discussed in DeMaria (1987), this form of the equation 
amounts to a divergence-corrected barotropic model which prevents the retrogres- 
sion of ultralong Rossby waves. The parameter a is empirically determined and 
is taken to be l/800(km _1 ) based on a recommendation from DeMaria (personal 
communication). We still use the nondivergent wind to initialize the model. 

Given the wind components (u, v) at each of the grid points, a scheme for com- 
puting the solution of (8) is described as follows: 
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(a) Compute ( and 6 from the initial wind field, then solve (4) and (5) for $ and 
X following the decomposition process described in Section 2. 

(b) Use (8) to step forward and find ( at the next incremental time. The vorticity 
at the boundary is calculated differently, depending on whether there is inflow 
or outflow at the boundary. If outflow, the vorticity is calculated by a one- 
sided finite difference version of (8). If inflow, the vorticity is unchanged. 

(c) Using the forecasted vorticity, (9) is solved for '£. 

(d) Repeat steps (b) and (c) until the entire forecast period is covered. 

Both the decomposition of the wind field and the barotropic forecast hinge on 
the solution to Poisson’s equation. To find the solution to this equation, we apply 
a direct method called block cyclic reduction (BCR). Since the use of this method 
in a parallel processing mode is critical to our work, salient features of this method 
are given in the appendix. For more details, refer to Buzbee, Golub and Nielson 
(1970), Hockney (1965), Lakshmivarahan and Dhall (1991), Swartztrauber (1984), 
and Sweet (1977, 1988). 


4. APPLICATION TO HURRICANE ELENA 

Hurricane Elena (1985) had its origin in a rapidly moving tropical wave that was 
first analyzed over the Saharan Desert (Velden, 1987). It moved over the Atlantic 
Ocean at a rate of ss 15 ms" 1 and appeared as a tropical depression over Cuba on 
August 28th. The depression intensified and was classified as a tropical storm by 
00 UTC on the 29th. It was upgraded to hurricane status 12 hours later when it 
was centered at ~ 25N, 85W. Subsequent positions at 12 h intervals are indicated by 
dots along the solid line in Figure 1. The initial time for our forecast was 12 UTC 
31 August, when the storm occupied the position indicated by ( § ) in Figure 1. 

a. Numerics: All experiments were conducted on the ALLIANT FX/8 using 
double precision arithmetic. Five separate modes of computation were used starting 
with the scalar mode — use of a single processor without the benefit of vectorization. 
This was followed by computation with 1, 2, 4, and 8 processors with vectorization. 
Since the BCR-partial fraction method was more efficient than the polynomial fac- 
torization form of BCR (Jwo et al., 1992), all experiments used the partial fraction 
approach (see Appendix). 

The equations are solved on the staggered mesh of grid points displayed in Fig- 
ure 2. This staggering is in accord with Arakawa’s C-grid convention (Arakawa, 
1966). The finite-difference approximation to the Jacobian conserves the vorticity, 
the kinetic energy, and the squared vorticity. 

Following DeMaria (1985, 1987), we adopted the Adams-Bashford scheme for 
integrating the model (Anderson et al., 1984). As suggested by DeMaria (personal 
communication), a reasonable limitation on the time-step is given by 

A < A 

1 ~ 2mV2(u 2 + v 2 )^ 


(25 min), 
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Figure 1 The observed and forecasted tracks of Elena (1985). The location of Elena at the initial time 
(r = 0, 12 UTC Aug. 31) for barotropic forecasts is indicated by ( ). The numbers 1, 2, and 3 indicate the 
location of Elena at 1, 2, and 3 days after t = 0. Our forecast is termed barotropic and the operational 
model is SANBAR. 



i 

i 

i 

i 


A 


* 


fe to 


Figure 2 The lower right hand corner of the staggered grid used in our study. The boundary values are 
denoted by the use of subscript ‘b\ 


where max signifies the largest wind speed, A and A, are the space and time incre- 
ments, and m is the image scale factor at the location of the maximum speed. As 
indicated in parentheses, the limiting A, is 25 min for our grid and wind field. A 
time step of 16 minutes was used in all of our computations. 
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Prior to the forecast, an artificial axisymmetric vortex — representing the hurri- 
cane circulation — is superimposed on the $ -field found from the decomposition 
process. This helps us locate the hurricane at subsequent times. The vortex remains 
intact through the 72 h forecast period, but there is a modest amount of numerical 
smoothing that weakens the vortex. The mathematical form of the vortex is: 

V " (r) = V "fe)“ p {H 1 -(^)1} and <*'>-? TrW’ 

( 10 ) 

where \ H is the tangential wind and Qj the associated vorticity. The parameters in 
the wind equation are: V m (the ma ximum tangential wind), r (the distance from the 
storm center), r m (the radius of maximum wind), and b (an empirical parameter that 
determines the rate at which V# decreases at large radii). We have used DeMaria’s 
values for these parameters to typify observed Atlantic tropical storms (DeMaria, 
1987). These values are V m = 25 ms -1 and r m = 150 km, and the parameter b is 
chosen such that V# = 5 ms -1 at 600 km (this gives a value of b = 0.998). 

b. Track forecast: Barotropic forecasting for a particular level in the tropical 
atmosphere has been unsuccessful for a variety of reasons that have been enumer- 
ated in Sanders and Burpee (1968). They determined "... that forecasting should 
be carried out for the tropospheric mean flow ... rather than by selection of a 
single, presumably representative level.” This work was the foundation of the op- 
erational model developed at NHC (Sanders et al., 1975). The operational model 
has become known as the Sanders Barotropic model, or SANBAR in acronymic 
form. Following the work of Sanders and Burpee (1968) and Sanders et al. (1975), 
we calculate a mean wind throughout the troposphere using satellite-derived winds 
and radiosondes following the procedures established at the Space Science and En- 
gineering Center in Madison, Wisconsin (UW-SSEC) (Velden et al„ 1984). In sub- 
sequent discussion, we refer to these winds as deep-layer-mean (DLM) winds. 

Figure 3 shows the DLM wind field at t = 0 (12 UTC 31 Aug.) plotted over 
our analysis and forecast domain. A well-defined trough that extends from New 
England down through the northeast comer of the Gulf of Mexico is apparent in 
the DLM wind vectors. The initial streamfunction field associated with the DLM 
wind is shown in Figure 4. The artificial hurricane vortex described in Section 4a 
has been superimposed on this large-scale $ field. The forecasted stream function 
fields at t = 72 h is shown in Figure 5. The path of the hurricane over the three-day 
period has been superimposed on the 72 h forecast of streamfunction. 

We estimated Elena’s position in our forecasts by finding the center of the imbed- 
ded vortex at 12 h intervals. This is done by finding the minimum value or the 
maximum £ value. We found insignificant difference in positioning Elena by ei- 
ther method and the forecasted positions at 12 h intervals are displayed in Fig- 
ure 1. The observed track as well as the forecasted track from SANBAR are also 
shown in this figure. The observed tight cycloidal motion (re-curvature) between 
t = 0 and t = 24 h was not captured by our model forecast This small scale feature 
in the trajectory is near the limit of the model resolution, i.e., the area outlined 
by this “teardrop” shaped part of the trajectory is « (0.5° lat) 2 = (50 km) 2 . A grid 
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Figure 3 Distribution of deep-layer-mean winds (DLM) at t = 0 (12 UTC, 31 Aug 85). The wind speed 
is scaled by the largest speed which is shown in the lower right hand corner (« 30 ms -1 ). Elena’s posi- 
tion at this time is indicated by ( ). 


8^'P 


Figure 4 Streamfimction associated with the DLM winds and imbedded vortex at t = 0. The contours 
of the streamfunction are drawn at intervals of 10 6 m 2 s -1 . 
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Figure 5 Same as Figure 4 except map is valid at t = 72 h. The observed track of Elena between 0-72 h 
has been superimposed. 


cell in the model is « (100 km) 2 . Although the forecasted location of landfall was 
« 150 km east of the observed location (near Mobile Bay), the time for landfall was 
accurate. The positioning error was less than the average error for 48 h predictions 
(NHC, personal communication). 

The parallel-barotropic forecast was an improvement over SANBAR because it 
did not use the “initial storm motion” (ISM) vector as part of the forecast. It has 
been found that operational forecasts with SANBAR generally benefit from the 
inclusion of an ISM vector based on the storm’s previous path. In cases exhibit- 
ing re-cnrvature, however, incorporation of recent history (persistence) is generally 
detrimental to the track forecast. Since re-curvature is difficult to anticipate and 
since it is a relatively rare event, the operational forecasts always include the ISM 
vector in the initial condition. Examination of the forecast from SANBAR indicates 
that the imposition of the ISM constraint at t = 0 forced Elena to erroneously move 
over the northern portion of the Florida peninsula during the early part of the 72 h 
forecast. 

5. SPEED-UP ANALYSIS 

Table 2 su mmar izes the results for our barotropic forecast using the Alliant multi- 
vector processor. This table illustrates two types of speed-up computations. The 
speed-up ratio 1 is the ratio with respect to the serial time. Using one vector pro- 
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Tbble 2 Speed-Up Ratios for 72 h Forecasting Process 


Number of Processors 

Elapsed Time (Sec) 

Speed-Up Ratio 1 

Speed-Up Ratio 2 

Serial 

216.18 

1.0 

— 

1 Vector 

109.47 

1.97 

1.0 

2 Vector 

61.03 

3.54 

1.79 

4 Vector 

38.82 

557 

2.82 

8 Vector 

29.53 

732 

3.71 


cessor we obtain a speed-up of nearly 2, using 8 vector processors a speed-up of 
7.32. In the latter case, there are two factors contributing to the speed-up, namely, 
parallelism (8 processors) and vectorization in each of the eight processors. In con- 
trast, the speed-up ratio 2 measures the speed-up resulting from multivectors to 
single vector processing. Thus, eight vector processors result in a speed-up ratio of 
3.71. Since vectorization is a common factor in the second ratio, the effect of paral- 
lelism on speed-up is isolated. 

Referring to the results in Table 2, one might wonder why a speed-up ratio of 
« 3.7 is achieved when 8 processors are used. In practice, a number of factors af- 
fect the speed-up achievable. First, the algorithm itself is a factor. It could happen 
that there are segments of the computation that are intrinsically serial and cannot 
be parallelized at all. This will lead to loss of speed-up as dictated by the well- 
known Amdahl’s law (Lakshmivarahan and Dhall, 1990). Referring to the model 
equations (8) and (9), it can be verified that (refer to Appendix A), while BCR 
method for solving (9) exhibits “good” parallelism, the degree of parallelism does 
not remain constant, which is an intrinsic property of this class of algorithms. In- 
deed, the degree of parallelism decreases during the course of the reduction phase, 
and then increases in the back-substitution phase. In other words, the processor uti- 
lization decreases and increases during the reduction and back-substitution phases 
respectively. Secondly, stability of numerical integration dictates the use of Adams- 
Bashford scheme in combination with Arakawa’s approximation for the Jacobian. 
However, the computational process resulting from the use of these schemes do not 
readily vectorize and/or parallelize. These factors limit the speed-up that is achiev- 
able. 

Second, the machine architecture is a factor. In the Alliant FX/8, for example, 
the processors are connected to the main memory through a fast cache memory. 
The size of this cache limits the data transfer rate between the memory and the 
“thirsty” processors. Thus, when all eight processors are concurrently working in 
the vector mode, the data needed by these processors may not be readily available 
in cache memory which may affect the data transfer rate between memory and the 
processors. Thus, in such a situation, increase in number of processors results in 
diminishing returns in terms of the speed-up. Increasing the cache size will only 
postpone this saturation effect, but on architectures with no cache memory, such as 
the CRAY research machines, linear increase in speed-up can be achieved. 

Thirdly, when one increases the number of processors, unless the problem size is 
also increased, the given quantum of work is merely smeared across a large num- 
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ber of processors. Oftentimes it may take more time to distribute the data to the 
processors than to compute the results. This contributes to idling of processors and 
in turn reduces the processor efficiency, which is speed-up per processor. Since the 
data set available on Elena was fixed a priori, increasing the number of processors 
would only result in diminishing returns as evidenced in Table 2. Thus, success in 
parallel computing critically depends on the proper match between the algorithm, 
the architecture, and the problem size. 

6. CONCLUSION 

We have taken the simplest deterministic weather prediction model that is still used 
operationally — the barotropic model — and have brought the maturing technology 
of parallel processing to bear on its solution. We have applied the computational 
mechanics to the problem of tracking hurricane Elena (1985). 

The results are encouraging from both a pragmatic computational viewpoint as 
well as a practical weather forecasting viewpoint. A significant saving of compu- 
tational resources is possible with parallel processing. Essential to the time-saving 
is the use of parallel direct methods for solving the elliptic equations associated 
with the barotropic forecast We have found that the block cyclic reduction (BCR) 
method using partial fraction expansion is an efficient strategy. As a measure of the 
efficiency, we calculated the speed-up ratios with respect to both the serial mode 
and the one vector processor mode and found that a speed-up ratio of 3-4 is possi- 
ble when an 8 vector processor is compared to a single vector machine. 

A complete set of DLM winds for Elena (1985) were derived at the UW-SSEC 
and used as input to the barotropic model. A decomposition of these winds was 
accomplished in such a way that the kinetic energy in the nondivergent component 
was maximized. The streamfunction extracted from the decomposition process was 
used to initialize the barotropic model and application was made to the period of 
time when Elena executed a dramatic re-curvature in the eastern Gulf of Mexico. 
Although the model did not capture the small scale cycloidal motion associated 
with the abrupt change in Elena’s path, the larger-scale redirection from easterly 
to northwesterly was well handled. Especially encouraging was an accurate 48 h 
prediction of landfall. 

Based on this success, we are prepared to address the problem of coding the 
adjoint model that can be used in model sensitivity studies and data assimilation. 
Since the adjoint equations also hinge on the solution to Poisson’s equation, the 
implementation should proceed expeditiously. 

Appendix. Solution to Poisson’s Equation Using a Parallel Processing Version of 
Block Cyclic Reduction (BCR) 

BCR methods were first discussed by Hockney (1965) and were further developed 
by Buzbee et al. (1970). However, applicability to large-scale problems was ham- 
pered by the presence of serial bottleneck which inhibited the exploitation of full 
parallelism (Lakshmivarahan and Dhall, 1990). Only recently have methods been 
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developed to overcome this bottleneck (Sweet, 1988; Gallopoulos and Saad, 1989). 
The classical BCR, when combined with acceleration schemes, gives rise to an algo- 
rithm which holds promise for both parallelism and vectorization (Jwo et ah, 1992). 
We have now developed codes for the BCR method that include two basic options: 
polynomial factorization and the partial fraction expansion. Both versions can ac- 
commodate three types of boundary conditions: Dirichlet, Neumann, or periodic. 
They can also handle an arbitrary M xN grid as well as special grids with M 
(or N) ~ 2 k - 1, k an integer (Sweet, 1977; Jwo et al., 1992). Poisson’s equation 
is discretized on a uniform Mercator grid (100 km spacing at latitude 30°N). In the 
experiments, variables on the staggered grid have the following dimensions ( M,N ): 
u(65,43), v(64,44), $(65,44) and *(66, 45) where M is the index in the east-west 
direction and N is the corresponding index in the north-south direction. 

Poisson’s equation on an M x N grid is represented by 

Ax = b, (A.1) 

where A is a sparse symmetric matrix. When Dirichlet boundary conditions are used, 
A assumes the form: 



B 

c 

0 

0 • 

• 0 

0 

O' 


C 

B 

c 

0 • 

• 0 

0 

0 

A = 

0 

C 

B 
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0 
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0 

0 • 

c 

B 
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.0 

0 

0 

0 . 

• 0 

c 

B 


where there are ( N - 2) block matrices B along the principal diagonal of A, and B 


is a tridiagonal matrix of order (M 

— 2) x ( M — 2). The form of B is: 


’—4 

1 

0 0 

• • 0 

0 

O' 



1 

-4 

1 0 

• • 0 

0 

0 


B = 

0 

1 

-4 1 

• • 0 

0 

0 

(A.3) 


0 

0 

0 0 

• • 1 

-4 

1 



0 

0 

0 0 

• • 0 

1 

-4, 


and C is the identity matrix of 

order [M - 2) x (Af - 

2). 




We shall illustrate block cyclic reduction in the case where A is a 7 x 7 block 
tridiagonal matrix. The system Ax = b can be rewritten as 

Bxi + X 2 = bi 

x,_i + Bx, + x,+i = b, 2 < i < 6 (A.4) 


X6 + X 7 = b7. 
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Consider the sets of three consecutive equations centered around x<, for i — 2 , 4 
and 6 

x ,_ 2 + Bxi-ix, = b 'I 

Xi-iBx, + Xj+i = bi i i = 2,4,6, (A.5) 

X, + BXj +1 + Xi +2 = b;+i J 

where xo = x« = 0 . 

Multiplying the second equation by -B and adding the three equations of the 
set, we eliminate the odd indexed terms Xi_i and Xi+i. These operations can be 
accomplished in parallel for i — 2, 4 and 6 , and we obtain 

Xi — 2 + (21 - B 2 )x J + X |+2 = b;+i + b,_i - Bb,, i = 2,4,6. (A. 6 ) 

This constitutes the first step of the reduction process and results in the split of 
the 7x7 system into two subsystems — one for odd indexed x,’s called the eliminated 
system and the other for even indexed x,-’s called the reduced system 

B(1)x 2 + X 4 = b 2 

X 2 + B(1)X4 + X 6 = b4 (A.7) 

X 4 + B(1)X6 = b 6 

is the reduced system where B(l) = 21 - B(0) 2 , (B(0) = B), which is a matrix poly- 
nomial in B of degree 2 and b,(l) = b,+i +b,_i -B(0)b,, for i- 2, 4 and 6 . The 
eliminated system is given by 

'B(O) 0 0 0 - 

0 B(0) 0 0 

0 0 B(0) 0 

.0 0 0 B(0). 

Thus solving (A.7) for b 2 , b 4 , and b 6 , we can readily solve for xi, X 3 , X 5 and X 7 from 
the eliminated system. Notice (A.7) is again a block tridiagonal system and we can 
once again multiply the second equation in (A.7) by -B(l) and add the equations 
together to eliminate X 2 and X 6 . The new reduced system is 

B(2)X4 = b 4 (2), (A.9) 

where 

B(2) = 21- [B(l)] 2 , 

which is a matrix polynomial in B of degree 4 and 

b 4 (2) = b 2 (l) + b 6 (l)-B(l)b 4 (l). 

The new eliminated system is 


[B(l) 

0 

'X 2 ' 


’b2(l)-X4' 

O 

B(l). 

,X6. 


,b 6 (l)-X4. 


'Xl 


Dl-X 2 

x 3 


b? — x 2 - X4 

X5 


bs — X 4 - X6 

-X7. 


,b7 — X6 


(A.8) 


(A. 10) 
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1 2 3 4 5 6 7 



Figure 6 Pictorial display of the block cyclic reduction (BCR) process for a linear system of 7 equations 
where each node in the diagram represents an equation in one of the variables: X\,Xi,...,Xz. 


This algorithm is diagrammalically illustrated in Figure 6. It consists of two 
phases. In the first step of the reduction phase, eliminate xi, X3, X5, and X7 in parallel 
to get a reduced system in X2, X4, and x<$. In the second step eliminate X2 and X6 in 
parallel to obtain the reduced system in X4. By solving this system we can recover 
other unknowns in the back-substitution phase. Knowing X4, X2 and X6 are recovered 
first in parallel using (A.10) and then from (A.8), xi, X3, X5, and are recovered in 
parallel. 

Computationally, the back substitution involves solving 


B(2)X4 = b 4 (2) 

(A.11) 

B(l)x l = b,(l) - X4, for i = 2,6, 

(A. 12) 

B(0)x, = b, — Xj+i — x,_i 

(A- 13) 


for i = 1, 3, 5, and 7, where xo = xg = 0. 

Consider the system in (A.11). From B(2) = 21- [B(l)] 2 = 21 — (21 - [B(0)] 2 ) 2 , it 
follows that computation of B(2) involves expensive matrix multiplications. Besides, 
while B(0) is tridiagonal, [B(0)] 2 is not tridiagonal. Thus B(l) and B(2) do not in- 
herit the sparse structure of B(0). Thus solving (A. 11) by directly computing B(2) 
is not computationally worthwhile. An alternate approach is to exploit the structure 
of the matrix polynomial and to cleverly factor it so that each factor inherits the 
tridiagonal structure of B which can then be exploited in computation. To this end, 
define a class of polynomials recursively as follows: 

P 2 ^{a,t) = 2f r+1 - [p 2 '{a,t)f (A. 14) 

where 

P 2 (a,t)= It 1 -a 2 . 

pzr(a,l) is known as the Chebyshev polynomial of the first kind and degree 2 r . The 
importance of this class of polynomials stems from the fact that 

W) = pzrfai) |a=B,/=I • 
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Thus, B(l) = 21 - B 2 . By setting a = -2tcos9, it can be shown that 

P 2 '(a,t) = 2 1 1 cos 2 r 9. 


Clearly, 


P2'(a,t) = 0, 


if 


Using this we obtain a factorization 


T 


where 


p 2 r(a,t) = - fj(a + 2t + cos 9 } (r)), 
> i 


The corresponding factorization for B(r) is given by 

T 

y-i 


where 

H ; (r) = (B + 2Icos^-(r)). 

Notice Hy(r) is tridiagonal since B is tridiagonal and 21 cos ^(r) is diagonal. 

Given this factorization, there are two possibilities for solving systems of the type 

B(r)y = d for r = 0,1,2, We illustrate this using (A. 11). 

Recall 

B(2) = -Hi(2)H 2 (2)H 3 (2)H4(2). 

Let Zo = b4(2). Then (A. 11) becomes 

H 1 (2)H 2 (2)H 3 (2)H 4 (2)X4 = Zo. 

Solve 

H,(2)Zj = Zj_! 


successively for * = 1, 2, 3 and 4. Clearly X4 = Z r . The advantage of this approach 
called the polynomial factorization is that each H, (2) is sparse. Yet the collection of 
systems is solved sequentially which inhibits parallelism. 

In search of a method that admits more parallelism, rewrite (A. 11) as 

X4 = [B(2)]" 1 b 4 (2) 

= [P2'(a,t)]-± B4=l b A (2). 

Expressing [P2'(a,t)]~ l in partial fraction expansion we obtain 

X4 = [«iHf x (2) + a 2 H^(2) + a 3 H 3 - 1 (2) + a 4 H 4 - 1 (2)]b 4 (2). 
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In this expression each of the terms on the right hand side can be independently 
computed in parallel. Swarztrauber (1984) and Lakshmivarahan and Dhall (1990) 
can be consulted for further elaboration on the steps involved in parallelization of 
this algorithm. 
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